12 research outputs found

    Ensemble Clustering for Biological Datasets

    Tabu Search: A Comparative Study

    Clustering Network Data Using Mixed Integer Linear Programming

    Network clustering provides insights into relational data and feeds certain machine learning pipelines. We present five integer or mixed-integer linear programming formulations from the literature for crisp clustering. The first four clustering models employ an undirected, unweighted network; the last one employs a signed network. All models are coded in Python and solved using the Gurobi solver. The code for one of the models is explained. All code and datasets are made available. The aim of this chapter is to compare some of the integer or mixed-integer programming network clustering models and to provide access to Python code to replicate the results. Mathematical programming formulations are provided, and experiments are run on two different datasets. Results are reported in terms of computational times and the best number of clusters. The maximum diameter minimization model forms compact clusters including members with a dominant affiliation. The model generates a few relatively large clusters. Additional constraints can be included to force bounds on the cluster size. The NP-hard nature of the problem limits the size of the dataset, and one of the models is terminated after 6 days. The models are not practical for networks with hundreds of nodes and thousands of edges or more. However, the diversity of models suggests different practical applications in social sciences.
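The maximum diameter objective mentioned in this abstract can be illustrated without a solver. The sketch below is not the chapter's Gurobi formulation: it enumerates every assignment of nodes to `k` clusters on a tiny connected, unweighted graph and keeps the one with the smallest maximum cluster diameter. Measuring diameter with shortest-path hop counts in the whole graph is an assumption, and the function names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import product


def hop_distances(adj):
    """All-pairs shortest-path hop counts via BFS (graph assumed connected)."""
    dist = {}
    for source in adj:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[source] = seen
    return dist


def max_diameter(labels, dist):
    """Largest distance between two nodes that share a cluster."""
    nodes = list(labels)
    worst = 0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if labels[u] == labels[v]:
                worst = max(worst, dist[u][v])
    return worst


def brute_force_min_max_diameter(adj, k):
    """Try all k**n assignments; only feasible for very small graphs."""
    nodes = sorted(adj)
    dist = hop_distances(adj)
    best_value, best_labels = None, None
    for assignment in product(range(k), repeat=len(nodes)):
        if len(set(assignment)) < k:  # require k non-empty clusters
            continue
        labels = dict(zip(nodes, assignment))
        value = max_diameter(labels, dist)
        if best_value is None or value < best_value:
            best_value, best_labels = value, labels
    return best_value, best_labels


# Hypothetical toy network: two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
value, labels = brute_force_min_max_diameter(adj, k=2)
```

The number of assignments grows as `k**n`, which gives a rough feel for why exact approaches become impractical as networks grow.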

    Recent Applications in Data Clustering

    Clustering has emerged as one of the more fertile fields within data analytics, widely adopted by companies, research institutions, and educational entities as a tool to describe similar or different groups. The book Recent Applications in Data Clustering aims to provide an overview of recent contributions to the vast clustering literature that offers useful insights within the context of modern applications for professionals, academics, and students. The book spans the domains of clustering in image analysis, lexical analysis of texts, replacement of missing values in data, temporal clustering in smart cities, comparison of artificial neural network variations, graph theoretical approaches, spectral clustering, multiview clustering, and model-based clustering in an R package. Applications of image, text, face recognition, speech (synthetic and simulated), and smart city datasets are presented.

    A Comparison of Heuristics with Modularity Maximization Objective using Biological Data Sets

    Finding groups of objects exhibiting similar patterns is an important data analytics task. Many disciplines have their own terms, such as cluster, group, clique, and community, for a set of similar objects. Adopting the term community, many exact and heuristic algorithms have been developed to find the communities of interest in available data sets. Here, three heuristic algorithms to find communities are compared using five gene expression data sets. The heuristics have a common objective function of maximizing modularity, a quality measure of a partition that reflects objects’ relevance in communities. Partitions generated by the heuristics are compared with the real ones using the adjusted Rand index, one of the most commonly used external validation measures. The paper discusses the results of the partitions on the mentioned biological data sets.
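The two measures named in this abstract, modularity and the adjusted Rand index, can be sketched in a few lines. The code below uses the standard textbook definitions for an undirected, unweighted graph and for two flat labelings; it is not taken from the paper, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import comb


def modularity(edges, labels):
    """Newman modularity: sum over communities of e_c/m - (d_c / 2m)^2."""
    m = len(edges)
    internal = Counter()    # e_c: edges with both ends in community c
    degree_sum = Counter()  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two labelings of the same objects."""
    n = len(true_labels)
    cells = Counter(zip(true_labels, pred_labels))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy network: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
communities = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, communities)

# The index ignores label names, so a relabeled identical partition scores 1.
ari_same = adjusted_rand_index([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0])
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```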

    A Minimum Spanning Tree Based Clustering Algorithm for High throughput Biological Data

    A new minimum spanning tree (MST) based heuristic for clustering biological data is proposed. The heuristic uses MSTs to generate initial solutions and applies a local search to improve the solutions. Local search transfers the nodes to the clusters with which they have the most connections, if this transfer improves the objective function value. A new objective function is defined and used in the heuristic. The objective function considers both tightness and separation of the clusters. Tightness is obtained by minimizing the maximum diameter among all clusters. Separation is obtained by minimizing the maximum number of connections of a gene with other clusters. The objective function value is calculated on a binary graph generated using the threshold value that keeps the minimum percentage of edges while the binary graph remains connected. Shortest paths between nodes are used as distance values between gene pairs. The efficiency and the effectiveness of the proposed method are tested using fourteen different data sets externally and biologically. The method finds clusters which are similar to actual ones using 12 data sets for which actual clusters are known. The method also finds biologically meaningful clusters using 2 data sets for which real clusters are not known. A mixed integer programming model for clustering biological data is also proposed for future studies.
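The separation measure and the local search move described in this abstract can be sketched as follows. This is one possible reading, not the authors' implementation: separation is taken here as the largest number of neighbours any node has outside its own cluster, and the move proposes the cluster holding most of a node's neighbours. The check that a transfer improves the full objective (including tightness) is omitted, and all names and the toy graph are hypothetical.

```python
from collections import Counter


def connections_by_cluster(node, adj, labels):
    """Count the neighbours of `node` per cluster."""
    return Counter(labels[nb] for nb in adj[node])


def separation(adj, labels):
    """Largest number of edges any single node has to other clusters."""
    worst = 0
    for node in adj:
        counts = connections_by_cluster(node, adj, labels)
        external = sum(c for cl, c in counts.items() if cl != labels[node])
        worst = max(worst, external)
    return worst


def preferred_cluster(node, adj, labels):
    """Cluster with the most connections to `node`; ties favour staying put."""
    counts = connections_by_cluster(node, adj, labels)
    if not counts:
        return labels[node]
    best = max(counts.values())
    if counts.get(labels[node], 0) == best:
        return labels[node]
    return min(cl for cl, c in counts.items() if c == best)


# Hypothetical binary graph: two triangles joined by the edge 2-3,
# with node 2 initially placed in the wrong cluster.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}

before = separation(adj, labels)
target = preferred_cluster(2, adj, labels)
moved = dict(labels)
moved[2] = target
after = separation(adj, moved)
```

In this toy case the move lowers the separation value, which is the kind of improvement the local search would accept.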

    Gene Coexpression Network Comparison via Persistent Homology

    Persistent homology, a topological data analysis (TDA) method, is applied to microarray data sets. Although there are a few papers referring to TDA methods in microarray analysis, persistent homology has not, to the best of our knowledge, been used before to compare several weighted gene coexpression networks (WGCN). We calculate the persistent homology of weighted networks constructed from 38 Arabidopsis microarray data sets to test the relevance and the success of this approach in distinguishing the stress factors. We quantify multiscale topological features of each network using persistent homology and apply a hierarchical clustering algorithm to the distance matrix whose entries are pairwise bottleneck distances between the networks. The immunoresponses to different stress factors are distinguishable by our method. The networks of similar immunoresponses are found to be close with respect to bottleneck distance, indicating similar topological features of the WGCNs. This computationally efficient technique for analyzing networks provides a quick test for advanced studies.
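As a rough illustration of what persistent homology records for a weighted network, the sketch below computes only the dimension-0 part: the filtration values at which connected components merge as edges are added in order of increasing weight. It assumes that every node is present from the start and that edge weights act as dissimilarities; the abstract does not specify the filtration, and higher-dimensional features and bottleneck distances are not computed here. The function name and the toy network are hypothetical.

```python
def h0_death_times(n_nodes, weighted_edges):
    """Filtration values at which two components merge (union-find)."""
    parent = list(range(n_nodes))

    def find(x):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:       # the edge joins two components
            parent[root_u] = root_v
            deaths.append(weight)  # one component dies at this weight
        # otherwise the edge closes a cycle and does not affect dimension 0
    return deaths


# Hypothetical toy network: edges given as (weight, node, node).
toy_edges = [(0.2, 0, 1), (0.5, 1, 2), (0.9, 0, 2), (0.7, 2, 3)]
deaths = h0_death_times(4, toy_edges)
```

Each value in `deaths` pairs with a birth at the start of the filtration; components that never merge persist indefinitely and are not listed.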

    Supply chain management and optimization in manufacturing

    This book introduces general supply chain terminology, particularly for novice readers, along with state-of-the-art supply chain management and optimization issues and problems in manufacturing. The book provides insights for making supply chain decisions and for planning and scheduling throughout the supply chain network. It introduces optimization problems faced throughout the supply chain network, namely the transportation of raw materials and products, and the location and inventory of plants, warehouses, and retailers.

    Uncovering Dynamic Brain Reconfiguration in MEG Working Memory n-Back Task Using Topological Data Analysis

    The increasing availability of high temporal resolution neuroimaging data has increased the efforts to understand the dynamics of neural functions. Until recently, there were few studies on generative models supporting classification and prediction of neural systems compared to the description of the architecture. However, the requirement of collapsing data spatially and temporally in the state-of-the-art methods to analyze functional magnetic resonance imaging (fMRI), electroencephalogram (EEG), and magnetoencephalography (MEG) data causes loss of important information. In this study, we addressed this issue using a topological data analysis (TDA) method, called Mapper, which visualizes evolving patterns of brain activity as a mathematical graph. Accordingly, we analyzed preprocessed MEG data of 83 subjects from the Human Connectome Project (HCP) collected during a working memory n-back task. We examined variation in the dynamics of the brain states with the Mapper graphs to determine how this variation relates to measures such as response time and performance. The application of the Mapper method to MEG data detected a novel neuroimaging marker that explained the performance of the participants along with the ground truth of response time. In addition, TDA enabled us to distinguish two task-positive brain activations during 0-back and 2-back tasks, which are hard to detect with the other pipelines that require collapsing the data in the spatial and temporal domains. Further, the Mapper graphs of the individuals also revealed one large group in the middle of the stimulus detecting the high engagement in the brain with fine temporal resolution, which could contribute to increasing spatiotemporal resolution by merging different imaging modalities. Hence, our work provides further evidence of the effectiveness of TDA methods for extracting subtle dynamic properties of high temporal resolution MEG data without the temporal and spatial collapse.